SkinnerDB: Regret-bounded Query Evaluation via Reinforcement Learning
Authors
Abstract
SkinnerDB uses reinforcement learning for reliable join ordering, exploiting an adaptive processing engine with specialized join algorithms and data structures. It maintains no data statistics and uses no cost or cardinality models. Also, it uses no training workloads, nor does it try to link the current query to seemingly similar queries in the past. Instead, it learns optimal join orders from scratch during the execution of the current query. To that purpose, SkinnerDB divides the execution of a query into many small time slices. Different join orders are tried in different time slices. SkinnerDB merges result tuples generated according to different join orders until a complete query result is obtained. By measuring execution progress per time slice, SkinnerDB identifies promising join orders as execution proceeds. Along with SkinnerDB, we introduce a new quality criterion for query execution strategies. We upper-bound expected execution cost regret, i.e., the expected amount of execution cost wasted due to sub-optimal join order choices. SkinnerDB features multiple execution strategies that are optimized for that criterion. Some of them can be executed on top of existing database systems. For maximal performance, we introduce a customized execution engine, facilitating fast join order switching via specialized multi-way join algorithms and tuple representations. We experimentally compare SkinnerDB's performance against various baselines, including MonetDB, Postgres, and adaptive processing methods. We consider various benchmarks, including the join order benchmark, TPC-H, and JCC-H, as well as benchmark variants with user-defined functions. Overall, the overheads of reliable join ordering are negligible compared to the performance impact of the occasional, catastrophic join order choice.
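The time-slicing idea described in the abstract can be illustrated as a multi-armed bandit over join orders: each candidate order is run for a short slice, rewarded by its measured progress, and the next order is chosen by an upper-confidence-bound score. The sketch below is a simplified illustration under assumed names and a simulated progress model, not SkinnerDB's actual engine (which uses UCT over the space of join orders and merges partial results across slices).

```python
import math
import random

# Illustrative sketch: treat each join order as a bandit arm, run it for
# a time slice, reward it by measured progress, and select the next
# order with a UCB score. TRUE_PROGRESS and run_slice are stand-ins for
# real per-slice execution; all names here are hypothetical.

def ucb_score(total_reward, pulls, total_pulls, c=math.sqrt(2)):
    """Mean reward plus an exploration bonus (classic UCB1 form)."""
    if pulls == 0:
        return float("inf")  # ensure every order is tried at least once
    return total_reward / pulls + c * math.sqrt(math.log(total_pulls) / pulls)

def choose_join_order(orders, stats, total_pulls):
    """Pick the join order with the highest UCB score."""
    return max(orders,
               key=lambda o: ucb_score(stats[o][0], stats[o][1], total_pulls))

# Simulated average per-slice progress for three join orders of the same
# query; one order is clearly better (illustrative numbers only).
TRUE_PROGRESS = {("R", "S", "T"): 0.9, ("S", "R", "T"): 0.3, ("T", "S", "R"): 0.2}

def run_slice(order):
    # Stand-in for executing one time slice and measuring tuples produced.
    return random.gauss(TRUE_PROGRESS[order], 0.05)

random.seed(0)
orders = list(TRUE_PROGRESS)
stats = {o: [0.0, 0] for o in orders}  # order -> [total reward, slices run]
for t in range(1, 201):                # 200 time slices
    order = choose_join_order(orders, stats, t)
    stats[order][0] += run_slice(order)
    stats[order][1] += 1

best = max(orders, key=lambda o: stats[o][1])
print("most-used join order:", best)
```

Because the exploration bonus shrinks as an order accumulates slices, time spent on bad orders grows only logarithmically in the number of slices, which is the intuition behind the paper's bounded expected regret.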
Similar resources
Reinforcement and Imitation Learning via Interactive No-Regret Learning
Recent work has demonstrated that problems (particularly imitation learning and structured prediction) where a learner's predictions influence the input distribution it is tested on can be naturally addressed by an interactive approach and analyzed using no-regret online learning. These approaches to imitation learning, however, neither require nor benefit from information about the cost of acti...
Near-optimal Regret Bounds for Reinforcement Learning
This technical report is an extended version of [1]. For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s, s′ there is a policy which moves from s to s′ i...
Minimax Regret Bounds for Reinforcement Learning
We consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs. We show that an optimistic modification to value iteration achieves a regret bound of Õ(√(HSAT) + H²S²A + H√T) where H is the time horizon, S the number of states, A the number of actions and T the number of timesteps. This result improves over the best previous known bound Õ(HS√(AT)) achiev...
Near-optimal Regret Bounds for Reinforcement Learning
For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s, s′ there is a policy which moves from s to s′ in at most D steps (on average). We present a reinfo...
Journal
Journal title: ACM Transactions on Database Systems
Year: 2021
ISSN: 1557-4644, 0362-5915
DOI: https://doi.org/10.1145/3464389